Trends in incidence, prevalence, and survival of breast cancer in the United Kingdom from 2000 to 2021

Breast cancer is the most frequently diagnosed cancer in females globally. However, we know relatively little about trends in males. This study describes United Kingdom (UK) secular trends in breast cancer from 2000 to 2021 for both sexes. We describe a population-based cohort study using UK primary care Clinical Practice Research Datalink (CPRD) GOLD and Aurum databases. There were 5,848,436 eligible females and 5,539,681 eligible males aged 18+ years, with ≥ one year of prior data availability in the study period. We estimated crude breast cancer incidence rates (IR) and prevalence, and survival probability at one, five and ten years after diagnosis using the Kaplan–Meier method. Analyses were further stratified by age. Crude IR of breast cancer from 2000 to 2021 was 194.4 per 100,000 person-years for females and 1.16 for males. Crude prevalence in 2021 was 2.1% for females and 0.009% for males. Both sexes have seen around a 2.5-fold increase in prevalence across time. Incidence increased with age for both sexes, peaking in females aged 60–69 years and males aged 90+ years. There was a drop in incidence for females aged 70–79 years. From 2003 to 2019, incidence more than doubled in younger females (aged 18–29: IR 2.12 in 2003 vs. 4.58 in 2018); decreased in females aged 50–69 years; and further declined from 2015 onwards in females aged 70–89 years. Survival probability for females at one, five and ten years after diagnosis was 95.1%, 80.2%, and 68.4%, and for males 92.9%, 69.0%, and 51.3%. Survival probability at one year increased by 2.08 percentage points, and survival at five years increased by 5.39% from 2000–2004 to 2015–2019 for females, particularly those aged 50–70 years. For males, there were no clear time trends for short-term and long-term survival probability. Changes in incidence of breast cancer in females largely reflect the success of screening programmes, as rates rise and fall in synchronicity with ages of eligibility for such programmes.
Overall survival from breast cancer for females has improved from 2000 to 2021, again reflecting the success of screening programmes, early diagnosis, and improvements in treatments. Male breast cancer patients have worse survival outcomes compared to females, highlighting the need to develop male-specific diagnosis and treatment strategies to improve long-term survival in line with females.


Ethical approval
The protocol for this research was approved by the Research Data Governance (RDG) Board of the Medicines and Healthcare products Regulatory Agency database research (protocol number 22_001843).

Results

Patient populations and characteristics
There were 5,848,436 eligible female and 5,539,681 eligible male patients aged 18 years and older, with at least one year of data availability prior to diagnosis, from January 2000 to December 2021 in CPRD GOLD. Attrition tables for this study can be found in the supplementary information (Supplement S2). A summary of the characteristics of patients with a diagnosis of breast cancer, stratified by sex, is shown in Table 1.
Overall, the majority of those with breast cancer were female, with a median age of 63 years across both databases. Males made up only 0.6% of diagnoses and had an older median age of 70 years. In females, the largest share of patients was aged 60-69 years, accounting for 26% of diagnosed patients, whereas for males, those aged 70-79 years accounted for 32% of diagnosed patients. Overall, males were more likely to have comorbidities than females, apart from depressive disorders, which were more common in females. The patient characteristics in Aurum can be found in the supplementary information (Supplement S3).

Overall and annualised crude incidence rates for study population stratified by age and sex across databases
Overall crude incidence rates
Table 2 shows the overall incidence rates (IR) of breast cancer stratified by age and sex. For females, the overall IR per 100,000 person-years (pys) of breast cancer from 2000 to 2021 was 194.4 (95% CI 193.1-195.7) in GOLD, with slightly lower results in Aurum (180.4; 95% CI 179.5-181.3). For males, the overall IR was 1.16 (95% CI 1.07-1.28) in GOLD, with the same results in Aurum. When stratifying by age, the overall IR for females increased with age, peaking in those aged 60-69 years (IR: 381.0) before dropping in those aged 70-79 years (IR: 343.9), increasing in those aged 80-89 years (IR: 366.9), and decreasing again in those aged 90+ years (IR: 357.6). This trend was similar in both databases. For males, overall IR was higher with increasing age up to 90+ years (IR: 6.7) in GOLD and up to 80-89 years (IR: 5.2) in Aurum. The biggest increase in overall IR for females was between those aged 30-39 years (IR: 38.3) and those aged 40-49 years (IR: 143.7), a 3.75-fold increase, whereas the biggest difference in IR for males was between those aged 50-59 years (IR: 0.9) and those aged 60-69 years (IR: 2.2), a 2.43-fold increase.
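As a worked illustration of the quantities reported above, the following sketch computes a crude incidence rate per 100,000 person-years with a 95% confidence interval, and the fold change between two age bands. The event and person-year counts are hypothetical, and the normal approximation to the Poisson interval is an assumption, since this section does not state which interval method was used for Table 2. Only the two age-specific female rates (38.3 and 143.7) are taken from the text.

```python
import math

def crude_incidence_rate(events, person_years, per=100_000, z=1.96):
    """Crude rate per `per` person-years with a normal-approximation CI.

    Assumption: event counts are treated as Poisson, so the standard
    error of the rate is sqrt(events) / person_years.
    """
    rate = events / person_years * per
    half_width = z * math.sqrt(events) / person_years * per
    return rate, rate - half_width, rate + half_width

def fold_change(rate_from, rate_to):
    """Ratio of two rates, e.g. between adjacent age bands."""
    return rate_to / rate_from

# Hypothetical counts, not taken from the study.
rate, lower, upper = crude_incidence_rate(events=500, person_years=250_000)
print(f"IR {rate:.1f} (95% CI {lower:.1f}-{upper:.1f}) per 100,000 pys")

# Female rates reported in the text for ages 30-39 and 40-49.
print(f"fold change: {fold_change(38.3, 143.7):.2f}")
```

With small event counts, as in the male strata, an exact Poisson interval would usually be preferable to this approximation.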

Annualised crude incidence rates
For females, annualised IRs increased rapidly to 2004, showed a sharp peak, and then rose gradually up to 2014 before declining (Fig. 1). In GOLD, IRs dropped in 2020 but recovered in 2021. For males, IRs increased slightly over the study period but were highly variable due to small sample numbers (Fig. 1).
When stratifying by age group, annualised IRs over the study period showed different trends in females depending on age at diagnosis (Fig. 2). For those aged 18 [...] For those aged 90+ years, there were differences between the two databases, with IRs in GOLD declining but with a peak in 2013 (474.0), whereas IRs in Aurum increased over the study period, peaking in 2018 (394.1). For all age groups, IRs decreased in 2020 before increasing in 2021, apart from those aged 30-39 years. For males, there were not enough cases per age group to assess trends in annualised IR, apart from those aged 70-79 years, in whom IRs were stable over the study period (Supplement S4).
All results for this study can be found and downloaded in a user-friendly interactive web application: https://dpa-pde-oxford.shinyapps.io/BreastCancerIncPrevSurvShiny/.

Crude prevalence
In GOLD, the crude prevalence for breast cancer in 2021 was 2.11% (2.09%-2.14%) for females and 0.009% (0.008%-0.011%) for males, which is equivalent to 21,100 cases per million people for females and 90 cases per million people for males. Similar prevalence was obtained in 2019 when comparing GOLD and Aurum across sexes. When stratifying by age, prevalence in GOLD in 2021 peaked in females aged 70-79 years (5.39%) and in males aged 90+ years (0.06%), with similar trends in 2019 when comparing both databases (Supplement S5).
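Converting a percentage prevalence into a case count per population unit is simple arithmetic. The sketch below applies it to the two GOLD percentages quoted for 2021; the helper name is illustrative only.

```python
def prevalence_per(percent, per):
    """Convert a prevalence given as a percentage to cases per `per` people."""
    # Rounding removes floating-point noise from the multiplication.
    return round(percent * per / 100, 6)

# GOLD crude prevalence in 2021, as reported in the text.
female_pct, male_pct = 2.11, 0.009

for label, pct in (("females", female_pct), ("males", male_pct)):
    print(
        label,
        prevalence_per(pct, 100_000), "per 100,000;",
        prevalence_per(pct, 1_000_000), "per million",
    )
```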

Annualised crude prevalence
In GOLD, prevalence increased from 2000 to 2013 before stabilising to 2018 in females and declining to the end of the study period in males, whereas in Aurum, prevalence increased each year over the study period for females and males (Fig. 3). Both sexes have seen around a 2.5-fold increase in prevalence across the study period in both databases (Fig. 3). The annual percent change was significantly different from zero across all timepoints, except between 2013 and 2018 for males in CPRD GOLD (supplementary figures S6-S9). When stratifying by age, prevalence trends over time for females differed somewhat between age groups (Fig. 4). Those aged 40-49 and 60-69 years showed increases in prevalence until 2014, after which prevalence stabilised to the end of the study period, with a small decline in those aged 40-49 years. For all other age groups, prevalence increased over the study period.
For males, there were again differences in prevalence trends over the study period depending on age (Fig. 5). For those aged 40-49 years, prevalence was stable for most of the study period, with an increase from 2012/14. For those aged 50-79 years, prevalence increased over the study period in both databases. For those aged 80-89 years, prevalence in GOLD increased between 2000 and 2018 and declined thereafter, whereas in Aurum prevalence increased over time. For those aged 30-39 years and 90+ years, there were not enough data to assess trends in GOLD; however, in Aurum, prevalence was stable in those aged 30-39 years and increased over time in those aged over 90 years.

Overall survival probability for breast cancer population stratified by age, sex and calendar year
For females, there were 84,984 patients, 19,974 deaths and a median follow-up of 4.7 years; for males, there were 505 patients, 173 deaths and a median follow-up of 3.8 years in GOLD (Fig. 6). The numbers of patients at risk, died and censored across the follow-up period are given in Supplement S10. Median survival was not reached for females within the follow-up period, whereas for males median survival was 10 to 11 years across both databases. That is, the estimated survival probability for females remained above 50% throughout follow-up, whereas for males it fell to 50% between 10 and 11 years after diagnosis. Survival probability for females at one, five and ten years after diagnosis was 95.1%, 80.2%, and 68.4%, and for males 92.9%, 69.0%, and 51.3% in GOLD, with similar results in Aurum.
Long-term survival probability at five and ten years was higher in females than in males across both databases (Supplement S11).
For females, when stratifying by age group, median survival was not reached for those aged 18-69 years. For those older than 70 years, median survival decreased with increasing age, from 11-12 years to 2.5 years in those older than 90 years. For males, median survival was not reached in those aged 40-69 and 90+ years. However, median survival was lower in those aged 80-89 years than in those aged 70-79 years across both databases.
For females, one-year survival probability was similar for those aged 18-69 years (97-98%), peaking in those aged 40-59 years and declining from 70 years (Table 3). At five and ten years, survival probability increased from those aged 18-29 years, peaked in those aged 50-59 years, and declined thereafter. For males, those aged 80 years and older had lower short- and long-term survival probability than younger age groups; however, sample numbers were small.
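To make the reported quantities concrete, the sketch below implements the Kaplan-Meier product-limit estimator in plain Python and reads off a survival probability at a fixed time and the median survival, returning `None` when the curve never falls to 50% (the "median not reached" case). The follow-up times and event flags are hypothetical toy data, and the code is an illustration of the estimator rather than the study's analysis code.

```python
from collections import Counter

def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct time with at least one death.

    `times` are follow-up durations; `events` are 1 for death, 0 for censored.
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    leaving = Counter(times)  # everyone leaves the risk set at their own time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d:
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve

def survival_at(curve, t):
    """Survival probability at time t (step function, 1.0 before first death)."""
    probability = 1.0
    for time, survival in curve:
        if time > t:
            break
        probability = survival
    return probability

def median_survival(curve):
    """First time survival falls to 0.5 or below, or None if not reached."""
    for time, survival in curve:
        if survival <= 0.5:
            return time
    return None

# Hypothetical follow-up in years; 1 = died, 0 = censored.
toy_curve = kaplan_meier([1, 2, 3, 4, 5], [1, 0, 1, 0, 0])
print(survival_at(toy_curve, 2), median_survival(toy_curve))
```

In the toy data the curve stays above 50%, so the median is not reached, which mirrors the situation described for females above.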
To investigate if survival probability has changed over time, we stratified by calendar time of cancer diagnosis in five-year windows. Figure 7 shows the KM survival curves stratified by sex and calendar year. For females, short- and long-term survival probability increased over calendar time. Survival probability at one year increased by 2.1%, and survival probability at five years increased by 5.4% from 2000-2004 to 2015-2019 in GOLD (note that survival data stratified by calendar year was not available for Aurum). For males, when comparing survival probability between those diagnosed in 2000-2004 with those diagnosed in 2015-2019, there was no clear pattern over calendar time for short-term and long-term survival probability due to small sample numbers.
Supplement S12 shows the short- and long-term survival probabilities and 95% confidence intervals stratified by calendar year of diagnosis, age and sex. Short-term survival probability in the different age groups showed that females aged 50-69 years had increases in survival probability over time when comparing those diagnosed in 2000-2004 to those diagnosed in 2015-2019 (one-year survival probability of 97.4% vs. 99.0% in those aged 50-59 years; and 95.6% vs. 97.8% in those aged 60-69 years from 2000-2004 vs. 2015-2019). There was a similar pattern of increased survival probability over time for long-term (five-year) survival probability in
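The calendar-time stratification can be sketched as a simple binning of diagnosis year into five-year windows. The function name and the handling of the final, shorter 2020-2021 window are assumptions made for illustration, and the list of diagnosis years is hypothetical.

```python
from collections import Counter

def diagnosis_window(year, first_year=2000, last_year=2021, width=5):
    """Label the calendar window containing `year`, e.g. 2003 -> '2000-2004'."""
    if not first_year <= year <= last_year:
        raise ValueError(f"year {year} is outside the study period")
    start = first_year + (year - first_year) // width * width
    # Assumption: the last window is truncated at the end of the study period.
    end = min(start + width - 1, last_year)
    return f"{start}-{end}"

# Hypothetical diagnosis years, grouped into windows before survival analysis.
years = [2001, 2004, 2007, 2016, 2019, 2021]
print(Counter(diagnosis_window(y) for y in years))
```

Survival curves would then be estimated separately within each window, so that one-year and five-year probabilities can be compared between, for example, 2000-2004 and 2015-2019.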

Discussion
This study demonstrates trends of breast cancer incidence, prevalence and survival probability in the UK in females and males. Below is a summary of the key findings in the context of previous research.

Overall incidence and prevalence for study population stratified by age and sex
Overall crude incidence rates of breast cancer in females (IR: 194 per 100,000 person-years) and males (IR: 1.2 per 100,000 person-years) were in line with national statistics (IR: 166 and 1.1 for females and males, respectively). The drop in overall incidence for females aged 70-79 years observed here is likely to coincide with the ending of routine breast cancer screening in the UK (women are eligible for the breast cancer screening programme between the ages of 50 and 70 years 18). Thus, this is likely a compensatory decrease in incidence, as screening has advanced the detection of cases among women in this age bracket 3. This also explains why the incidence subsequently reverts to somewhat higher rates among those aged 80-89 years. Overall incidence in those aged 90+ years is lower than in the preceding age group, which could indicate reduced diagnostic activity, perhaps due to general ill health in this age group. National data on prevalence of breast cancer are scarce. In this study, prevalence of breast cancer at the end of the study period was 2.1% in females, peaking in those aged 70-79 years, and 0.009% in males, peaking in those aged 90+ years. That said, these could be overestimates of population prevalence, as in this study anyone with a diagnosis of breast cancer was included until the end of their observation period. Patients surviving beyond 5-10 years who have been discharged would still contribute to the prevalence estimate. Furthermore, the increase in early-stage breast cancers in the context of screening programmes is likely to drive this overestimation further because patients survive longer. However, while many of these cases may be considered cured after five years and are no longer being actively treated, people in this survivorship phase may have long-term medical needs, and accordingly it is important to provide accurate counts to allow for healthcare planning.

Trends in incidence and prevalence over time for females and males
In terms of trends over time, incidence increased for females across the study period before dropping dramatically during 2020, coinciding with the COVID-19 pandemic, and returning to expected levels in 2021. Changes in incidence of breast cancer over time are likely explained by many factors, including increased screening and diagnostic activity, but also increases in risk factors in the population (such as obesity 19,20). Another time trend of note is the sharp spike in incidence in females in 2004-2005. One possible explanation is the introduction in 2004 of the Quality and Outcomes Framework (QOF), which assesses the performance of general practices in several key disease areas (including cancer) and provides financial incentives for achieving specified quality targets. Of note, there were substantially more patients from Scotland with a date of diagnosis in April 2004, which is the start of the QOF reporting period. Thus, screening, diagnostics and reporting of cancer diagnoses may have been greater during this period. Additionally, many screening units in the UK extended screening to women aged 65-70 years between 2001 and 2004 21, and it is possible that this extension accounted for the large spike observed.
When examining incidence trends over time by age group, three key findings are highlighted. First is the increase in incidence over the study period for younger women (aged 18-29 and 30-39 years). Possible explanations include increasing awareness of breast symptoms leading to more women being diagnosed; risk factor exposures in early life, such as earlier age of thelarche (pubertal phase of breast development) and menarche than in previous decades 22,23, leading to increased cumulative exposure to oestrogen; and increased use of hormonal contraceptives, which pose an elevated risk for breast cancer 24. Second, the decline in incidence for women aged 50-69 years from 2005 to 2019 may be a reflection of the success of screening programmes. Third, the decline from 2015 onwards for women aged 70-89 years coincides with the launch of the "Be Clear on Cancer" campaign aimed at women in this age group 25.
Prevalence of breast cancer in females and for most age groups in males increased across the study period, which is likely a reflection of increased survival due to the success of screening programmes (for females at least), early diagnosis and effective treatments 26,27.

Differences in short-term and long-term survival probability in different age groups in females and males
Short-term (one-year) survival probability in females was similar across the age groups from 18-69 years, whereas long-term survival probability at five years was low in younger age groups and highest in those aged 50-59 years, likely due to the eligibility of females for national breast cancer screening programmes in the UK (which starts at 50 years of age) 18. It should also be noted that women typically transition through menopause from age 50 years, and breast cancer that develops during menopause typically progresses more slowly and is less aggressive than earlier-onset breast cancer, which may account for age-related differences in survival 28.
Males have lower long-term survival than females, which is in line with previous studies 29. This could be due to several factors such as age and disease severity. Males tend to be older when diagnosed than females, and it is likely that older males present with more comorbidities and medication use, making treatment decisions more complex. Males also tend to present with more advanced disease, likely due to the rarity of the disease and consequent delays in diagnosis 5,30. Furthermore, as males are underrepresented in trials, treatment recommendations follow those for postmenopausal women 31. Therefore, the current management of male breast cancer might not be ideal and could explain the lower long-term survival compared to females. Additionally, males lack breast cancer screening, and it is clear that at least in females this has contributed to a marked decrease in breast cancer mortality since the late 1980s 32. The National Comprehensive Cancer Network (NCCN) recommends that males aged 35 years and older with BRCA mutations receive self-examination training for breast cancer alongside annual clinical breast examination 33. Yet the sporadic nature of genetic testing for such mutations may impede the practical realisation of this recommendation.

Survival probability over calendar time for whole population and age strata
For females, one-year survival probability increased by 2.08% and five-year survival probability increased by 5.39% from 2000-2004 to 2015-2019. Improvements in survival over the past 20 years are echoed in data from the National Cancer Registration and Analysis Service, which showed annual mortality rates of around 4% for females diagnosed between 1993 and 1999, reducing to around 1% for those diagnosed between 2010 and 2015. Similarly, 5-year cumulative mortality risk reduced from 14.4% for females diagnosed between 1993 and 1999 to 4.9% for those diagnosed between 2010 and 2015 13. In the current study, survival probability particularly improved for females aged 50-70 years, not surprisingly coinciding with eligibility for national screening programmes. Reassuringly, there did not appear to be an effect of the COVID-19 pandemic on short-term survival probability for those diagnosed in 2020-2021 compared to those diagnosed in the years prior. This is somewhat surprising, given data suggesting that screening, diagnosis and treatments were impacted by the pandemic 34-36.
For males, neither short-term nor long-term survival probability showed improvements across calendar periods, but this is likely driven by small sample sizes. Other data show that death rates in North-Western Europe decreased by between 10% and 40% from 2000-2004 to 2015-2017 14, and so further data are required before we have clear evidence on the survival trends from male breast cancer in the UK.

Strengths and limitations
The main strength of this study is the use of two large primary care databases covering the whole of the UK. CPRD GOLD covers primary care practices from England, Wales, Scotland and Northern Ireland (with greater representation from Scotland than from England), whereas CPRD Aurum covers primary care practices in England. The similarity between the results in both databases provides increased generalisability across the UK. Nevertheless, there were a few discrepancies in results between the two databases, which can partly be explained by differences in observation period for patients across the study period. In GOLD, the number of people in the database steadily increased from 2000 up to 2006, then remained stable until 2011 before a gradual decline. This gradual decline is likely due to GP practices in England moving to EMIS clinical systems. Furthermore, over time the demographic representation of GOLD has changed, which could explain differences in results. The advent of the CPRD Aurum database saw some practices transferred from GOLD to Aurum. Across our observation period, the number of practices from England and Northern Ireland fell, whilst the number from Scotland and Wales increased.
Another strength of our study is the inclusion of a complete study population database for the assessment of incidence and prevalence. In contrast, cancer registry studies extrapolate the registry data to the whole population using national population statistics, potentially introducing biases 16,17. The high validity and completeness of mortality data, with over 98% accuracy compared to national mortality records 37, allowed us to examine the impact of calendar time on overall survival probability, one of the key outcomes in cancer care.
Our study had limitations. First, we used primary care data without linkage to a cancer registry, potentially leading to misclassification and delayed recording of diagnoses. However, previous validation studies have shown high accuracy and completeness of cancer diagnoses in primary care records 38. Second, our use of primary care records precluded us from studying tumour histology, genetic mutations, staging or cancer therapies, which can all impact breast cancer survival. Therefore, our survival probability estimates may overestimate survival in those with higher staging as well as those with specific genetic mutations such as BRCA 1/2 39. Other factors such as socio-economic status and ethnicity could also result in different values for incidence, prevalence and survival 40-42. Third, in this study we calculated overall survival probability, which does not differentiate between deaths caused by cancer and deaths from other causes. Therefore, it is a broad measure of overall survival probability rather than specifically cancer mortality.

Conclusion
Our study demonstrates that changes in incidence of breast cancer in females largely reflect the success of national breast cancer screening programmes, as rates rise and fall in synchronicity with ages of eligibility for such programmes. Overall survival probability for females from breast cancer has improved from 2000 to 2021, again reflecting the success of screening programmes, early diagnosis, and improvements in treatments. Male breast cancer patients, however, have worse survival outcomes compared with those of female patients. This highlights the need to develop male-specific treatment strategies and, given there are no screening programmes for males, to promote education and breast self-examination recommendations for males, to improve long-term survival in line with females.

Study design, setting, and data sources
We carried out a population-based cohort study using routinely collected primary care data from the United Kingdom (UK). People with a diagnosis of breast cancer and a background cohort were identified from Clinical Practice Research Datalink (CPRD) GOLD to estimate overall survival probability, incidence, and prevalence. We additionally carried out this study using CPRD Aurum and compared the results with those from GOLD. Both databases contain pseudo-anonymised patient-level information on demographics, lifestyle data, clinical diagnoses, prescriptions, and preventive care provided by GPs and collected by the NHS as part of their care and support. CPRD GOLD contains data from across the UK, representing 6.9% of the UK population 43, whereas Aurum only contains data from England, but represents 13% of the population of England 44. CPRD GOLD and Aurum have been mapped to the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) to facilitate replication of analyses 45,46.
The protocol for this research was approved by the Research Data Governance (RDG) Board of the Medicines and Healthcare products Regulatory Agency database research (protocol number 22_001843). The data is provided by patients to their GPs and collected by the NHS as part of their care and support, and so consent is provided to GPs prior to inclusion in this study. All methods were carried out in accordance with ethical principles outlined in the Declaration of Helsinki and the European Network of Centres for Pharmacoepidemiology and Pharmacovigilance.

Study participants and time at risk
All participants were required to be aged 18 years or older and have at least one year of data availability prior to diagnosis, and information on age and sex. For the incidence and prevalence analysis, the study cohort consisted of individuals present in the database from 1st January 2000. For CPRD GOLD, these individuals were followed up to whichever came first: diagnosis of breast cancer, exit from the database, date of death, or the 31st of December 2021 (the end of the study period), whereas for Aurum, the end of the study period was 31st of December 2019 (due to data availability). For the survival analysis, only individuals with a newly diagnosed cancer were included. These individuals were followed up from the date of their diagnosis to either date of death, exit from the database, or end of the study period. Any patients whose death and cancer diagnosis occurred on the same date were removed from the survival analysis.
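The follow-up rule for the incidence analysis (earliest of diagnosis, database exit, death, or end of study) can be sketched as below. The function and variable names are hypothetical, the example dates are invented, and dividing by 365.25 days to obtain person-years is a common convention assumed here rather than one stated in the text.

```python
from datetime import date

# End of the study period per database, as described in the text.
STUDY_END = {"GOLD": date(2021, 12, 31), "Aurum": date(2019, 12, 31)}

def follow_up_end(database, diagnosis=None, exit_date=None, death=None):
    """Earliest of diagnosis, database exit, death, and end of study."""
    candidates = [d for d in (diagnosis, exit_date, death) if d is not None]
    candidates.append(STUDY_END[database])
    return min(candidates)

def person_years(start, end):
    """Time at risk in years (assumed convention: 365.25 days per year)."""
    return (end - start).days / 365.25

# Hypothetical individual: enters in 2000, diagnosed in 2005, leaves in 2010.
end = follow_up_end("GOLD", diagnosis=date(2005, 1, 1), exit_date=date(2010, 1, 1))
print(end, round(person_years(date(2000, 1, 1), end), 2))
```

Summing these person-years over the cohort, overall or within a calendar year, gives the denominator for the crude incidence rates.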

Outcome definitions
We used Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT) diagnostic codes to identify breast cancer events (see Supplementary Table S1). Diagnostic codes indicative of either non-malignant cancer or metastasis were excluded (apart from prevalence analyses), as well as diagnosis codes indicative of melanoma, sarcoma, lymphoma, and other tumours not originating from breast tissue. Note that prior diagnoses of other cancers were not excluded. The study outcome cancer definition was reviewed using the CohortDiagnostics R package 47. This package was used to identify additional codes of interest and to remove those highlighted as irrelevant based on feedback from clinicians with oncology expertise through an iterative process during the initial stages of analysis. For survival analysis, mortality was defined as all-cause mortality based on records of date of death. Mortality data in CPRD GOLD has been previously validated and shown to be over 98% accurate 37.

Statistical methods
The population characteristics of patients with a diagnosis of breast cancer were summarised on a range of comorbid conditions using standardised SNOMED codes, with median and interquartile range (IQR) used for continuous variables and counts and percentages used for categorical variables.
We calculated the overall and annualised crude incidence rates (IR) and annualised crude prevalence for breast cancer from 2000 to 2021. For incidence, the number of events, the observed time at risk, and the incidence rate per 100,000 person-years (pys) were summarised along with 95% confidence intervals. Annualised incidence rates were calculated with the number of incident cancer cases as the numerator and the recorded number of person-years in the general population within that year as the denominator, whereas overall incidence was calculated from 2000 to 2021.

[...] innovation programme and European Federation of Pharmaceutical Industries and Associations (EFPIA). IMI supports collaborative research projects and builds networks of industrial and academic experts in order to boost pharmaceutical innovation in Europe. The views communicated within are those of OPTIMA. Neither the IMI nor the European Union, EFPIA, or any Associated Partners are responsible for any use that may be made of the information contained herein. The study funders had no role in the conceptualisation, design, data collection, analysis, interpretation of data, decision to publish, or preparation of the manuscript. Additionally, there was partial support from the Oxford NIHR Biomedical Research Centre. The corresponding author had full access to all the data in the study and had final responsibility for the decision to submit for publication.

Figure 1. Annualised crude incidence rates for breast cancer stratified by database and sex.

Figure 2. Annualised crude incidence rates for females stratified by database and age group.

Figure 3. Annualised crude prevalence stratified by database and sex.

Figure 4. Annualised crude prevalence for females stratified by database and age.

Figure 5. Annualised crude prevalence for males stratified by database and age.

Figure 6. Kaplan-Meier survival curve of breast cancer stratified by database and sex.

Figure 7. Kaplan-Meier survival curve of breast cancer stratified by sex and calendar year of diagnosis.

Table 1. Baseline characteristics of breast cancer patients at time of diagnosis stratified by sex from CPRD GOLD. IQR: interquartile range.

Table 2. Overall crude incidence rates of breast cancer, stratified by age and sex in CPRD GOLD/Aurum.

Table 3. Overall survival rates after breast cancer diagnosis stratified by database, age and sex.